1 City



PHOTO BY TYLER MERBLER [1]
Los Angeles, CA has a population of 3.98M people, a median age of 35.6, and a median household income of $54,432.
1,741,311 total jobs with 179,755 tech jobs - 9.7% of total jobs

2 County



PHOTO BY PAOLO GAMBA [2]
Los Angeles County, CA has a population of 10.1M people, a median age of 36.3, and a median household income of $61,338.
4,550,399 total jobs with 491,128 tech jobs - 9.3% of total jobs

3 Should You Become a Data Scientist?


Jobs are plentiful, salaries are high, and growth is strong: IBM predicts that demand for data scientists will soar 28% by 2020 [3]. Let’s check the facts:

4 Employment Projections


The median annual wage for computer and information technology occupations was $84,580 in May 2017, which was more than double the median annual wage for all occupations of $37,690.[4]


Employment of computer and information technology occupations is projected to grow 13 percent from 2016 to 2026 in the US, faster than the average for all occupations.


These occupations are projected to add about 607k new jobs.
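
As a rough check on these figures, the arithmetic can be sketched in R. The inputs are the numbers quoted above. The annualized rate assumes steady compound growth over the ten years, and the implied 2016 base assumes the 607k figure corresponds exactly to 13 percent growth; both are illustrative assumptions, not BLS outputs.

```r
# Figures quoted in the text above (BLS, reference [4])
tech_wage   <- 84580   # median annual wage, computer and IT occupations, May 2017
all_wage    <- 37690   # median annual wage, all occupations, May 2017
growth_10yr <- 0.13    # projected employment growth, 2016 to 2026
new_jobs    <- 607000  # projected new jobs, as quoted

# How many times larger the tech median wage is
wage_ratio <- tech_wage / all_wage

# Assumption: steady compound growth, so (1 + r)^10 = 1 + growth_10yr
annual_growth <- (1 + growth_10yr)^(1 / 10) - 1

# Assumption: new_jobs is exactly growth_10yr of 2016 employment
implied_base_2016 <- new_jobs / growth_10yr

cat(sprintf("Wage ratio: %.2f\n", wage_ratio))
cat(sprintf("Annualized growth: %.2f%%\n", 100 * annual_growth))
cat(sprintf("Implied 2016 employment: %.2fM\n", implied_base_2016 / 1e6))
```

The wage ratio comes out at about 2.2, consistent with "more than double", and 13 percent over ten years works out to a little over 1 percent per year.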

5 Dynamic Visualizations


Tech employment data for Los Angeles city and county from 2005 to 2017.[5]
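
The tech job counts behind these charts combine two NAICS sectors, as described in reference [5]. A minimal sketch of that calculation follows. The data frame and its values are made up for illustration, the column names (`info`, `prof`, `total`, `tech`, `per`) are borrowed from the working code later in this report rather than from the QCEW files themselves, and treating `per` as a percentage is an assumption.

```r
# Hypothetical QCEW-style rows: one per year, employment by sector.
# The values are invented for illustration only.
jobs <- data.frame(
  year  = c(2015, 2016, 2017),
  info  = c(1200, 1300, 1500),    # NAICS 51: Information
  prof  = c(2800, 2900, 3000),    # NAICS 54: Professional, Scientific, & Technical
  total = c(40000, 40000, 50000)  # all jobs in the area
)

# Tech employment is the sum of the two sectors
jobs$tech <- jobs$info + jobs$prof

# Share of tech jobs as a percentage of all jobs, rounded to one decimal
jobs$per <- round(100 * jobs$tech / jobs$total, 1)

print(jobs)
```

Keeping `info` and `prof` as separate columns makes it easy to see which sector drives a change in the combined share from year to year.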

6 Dot Density Map

The map shows a high density of tech jobs in the West LA, West Hollywood, Burbank, Westlake, and Pasadena areas in 2017 [6] [7] [8].
Each dot represents 10 tech jobs.
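
The dot placement can be sketched in base R as follows. This is a simplified illustration, not the code used for the map: each zip code is reduced to a hypothetical bounding box, and the zip names, job counts, and coordinates are invented. With real boundaries from the shapefiles in [6], one would sample points inside each polygon instead, for example with `sf::st_sample`.

```r
set.seed(42)  # make the random dot placement reproducible

# Hypothetical zip-code areas, each reduced to a bounding box.
# Names, job counts, and coordinates are invented for illustration.
areas <- data.frame(
  zip  = c("zip_a", "zip_b"),
  tech = c(57, 130),              # tech jobs in the area
  xmin = c(0, 1), xmax = c(1, 2),
  ymin = c(0, 0), ymax = c(1, 1)
)

jobs_per_dot <- 10

# One dot per 10 jobs; leftover jobs below a full dot are dropped
areas$n_dots <- areas$tech %/% jobs_per_dot

# Scatter each area's dots uniformly inside its box
dots <- do.call(rbind, lapply(seq_len(nrow(areas)), function(i) {
  a <- areas[i, ]
  data.frame(
    zip = rep(a$zip, a$n_dots),
    x   = runif(a$n_dots, a$xmin, a$xmax),
    y   = runif(a$n_dots, a$ymin, a$ymax)
  )
}))

# plot(dots$x, dots$y, pch = 16, cex = 0.4) would draw the dots
print(table(dots$zip))
```

Because the positions are random within each area, individual dots do not mark actual workplaces; only the density per area carries meaning.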

7 Interactive Zip Code Map


8 Correlation Between the Education and Income



Association between the 2012 CHCI [9] and the 2016 median household income [10] of 282 zip codes in L.A. County. We can see a very strong correlation between the human capital level [23] and the household income of each zip code. It underscores the importance of investing in education (as measured by the CHCI) in order to create a more productive workforce and more high-paying jobs.
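
A minimal sketch of how such an association can be quantified is shown below. The data are invented for illustration and the data frame name `zips` is hypothetical; only the column names `chci` and `estimate` follow this report's plotting code. Rows with missing values are dropped before the calculation, as ggplot2 does when it draws the plot, and the fit uses log income, as the plotting code does.

```r
# Hypothetical zip-level data; values are invented for illustration.
# NA marks zip codes with no CHCI or no income estimate.
zips <- data.frame(
  zip      = c("z1", "z2", "z3", "z4", "z5", "z6"),
  chci     = c(95, 110, NA, 130, 150, 165),
  estimate = c(38000, 47000, 52000, NA, 82000, 98000)
)

# Keep only rows where both variables are present
complete <- zips[complete.cases(zips[, c("chci", "estimate")]), ]

# Pearson correlation between CHCI and median household income
r <- cor(complete$chci, complete$estimate)

# Linear fit on log income
fit <- lm(log(estimate) ~ chci, data = complete)

cat("Rows used:", nrow(complete), "\n")
cat("Pearson r:", round(r, 3), "\n")
print(coef(fit))
```

Reporting the correlation coefficient and the number of rows used alongside the plot makes the "very strong" claim checkable. A strong correlation on its own does not establish that education causes higher income.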

################

## Need to clean code below

# str(df.17)
# df.17$GEOID <- as.factor(df.17$GEOID)
# names(masterzip) <- c("GEOID", "name")
# z <- inner_join(df.17, masterzip, by = "GEOID")
# str(z)
# top_zip <- z %>% arrange(desc(per)) %>% top_n(20, wt=per)
# 
# names(top_zip) <- c("zip", "V1",    "year",  "tech",  "info",  "prof",  "per",   "total", "name")
# str(cor)
# str(top_zip)
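# # Convert via character: as.numeric() on a factor returns level codes, not zip codes
# top_zip$zip <- as.numeric(as.character(top_zip$zip))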
# 
# cor <- inner_join(top_zip, cor, by="zip")
################
# NEW

# Create the scatter plot with regression line
# 
# p7 <- ggplot(cor, aes(x=chci, y=estimate)) +
#   geom_point(colour = "blue", size=4) +
#   labs(title = "Fig. 8 ",
#        x = "City Human Capital Index", y = "Median Household Income ($)") +
#   scale_x_continuous(limits = c(85, 175)) +
#     geom_smooth(colour = "red", method=lm,   # Add linear regression line
#               se=FALSE)    # Don't add shaded confidence region
# p7

# p1 <- ggplot(cor, aes(x=chci, y=log(estimate)))+
#   geom_point(colour = "red")
# p1
# p2 <- p1 +
#   geom_smooth(mapping = aes(linetype = "r2"),
#               method = "lm",
#               formula = y ~ x , se = FALSE,
#               color = "black")
# p2 + geom_point()
# p2 + geom_point(shape = 1, size = 2) # Changes to shape = 1
# p3 <- p2 + geom_point(shape = 1, size = 3, stroke = 1.5)
# p3
# # Labels overlap - use ggrepel
# # install.packages("ggrepel")
# library("ggrepel") # Repel overlapping text labels away from each other.
# p4 <- p3 + geom_text_repel(aes(label = zip), size = 2) # Uses geom_text_repel from ggrepel
# p4



# Adds title / labels
# p5 <- p4 +
#   scale_x_continuous(name = "City Human Capital Index") +
#   scale_y_continuous(name = "Log Income, 2016") +
#   ggtitle("CHCI and Median Income Per Zip Code")
# p5
# 
# summary(cor)
# chci media 163

################
# str(df.17)
# df.17$GEOID <- as.factor(df.17$GEOID)
# names(masterzip) <- c("GEOID", "name")
# z <- inner_join(df.17, masterzip, by = "GEOID")
# str(z)
# top_zip <- z %>% arrange(desc(per)) %>% top_n(30, wt=per)
# str(cor)
# str(top_zip)


# # Creates vector with list of top tech zip codes
# zip.label <- top_zip$GEOID
#
# # Only shows labels from list created
#
# library(readxl)
# library(tidyverse) # Includes ggplot2 and dplyr
#
# p6 <- p5 + geom_text(aes(label = zip),
#             color = "gray20", data = filter(cor, zip %in% zip.label))
# p6

9 Conclusions

How to Become a Data Scientist in 8 Easy Steps [11]

  1. Get good at stats, math and machine learning
  2. Learn to code
  3. Understand databases
  4. Master data munging, visualization and reporting
  5. Level up with Big Data
  6. Experience, practice and meet fellow data scientists
  7. Internship, bootcamp or get a job
  8. Follow and engage with the community

10 References

  [1] Photo by Tyler Merbler - https://www.flickr.com/photos/37527185@N05/13934368189/
  [2] Photo by Paolo Gamba - https://www.flickr.com/photos/abukij/18012989253/
  [3] IBM Predicts Demand For Data Scientists - https://www.forbes.com/sites/unicefusa/2018/11/20/a-day-to-celebrate-defend-and-promote-child-rights/#74c6ac42e4d4
  [4] Employment Projections, data from the Bureau of Labor Statistics - https://www.bls.gov/emp/tables/emp-by-major-occupational-group.htm
  [5] Tech industry employment was calculated as the sum of (1) Information jobs (NAICS: 51) and (2) Professional, Scientific, & Technical Skills jobs (NAICS: 54). Data source: Quarterly Census of Employment and Wages (QCEW), developed through a cooperative program between the states and the U.S. Bureau of Labor Statistics. These data are summarized by Industry Sector (2-digit NAICS).
  [6] Shapefiles for zip code boundaries, downloaded from the County Data Portal - https://data.lacounty.gov/Geospatial/ZIP-Codes/65v5-jw9f
  [7] Zip code list, downloaded from http://file.lacounty.gov/SDSInter/lac/1031552_MasterZipCodes.pdf
  [8] Inspired by How to Make Dot Density Maps in R, Nathan Yau - https://flowingdata.com/2014/08/28/how-to-make-dot-density-maps-in-r/
  [9] City Human Capital Index - http://www.anderson.ucla.edu/centers/ucla-anderson-forecast/projects-and-partnerships/city-human-capital-index
  [10] Median income by zip code from the American Community Survey, obtained using the tidycensus R package. Request an API key at https://api.census.gov/data/key_signup.html
  [11] How to become a data scientist in 8 easy steps - https://insidebigdata.com/2014/11/14/become-data-scientist-8-easy-steps/

11 Acknowledgements